1 Introduction

1.1 Abstract

  • The challenge - Analyse and improve (using provided datasets) the loan application process. Design innovative ideas to conduct analysis and develop sound and executable business solutions to enhance the customer loan application experience.
  • The data - a real-life event log of the loan and overdraft approval process at a bank. The log consists of 262,200 events and 13,087 cases.
  • Event log - provided in XES format, a standard defined by the IEEE Task Force on Process Mining.

1.2 Business Process Mining

Process mining is a hot topic that has attracted intense interest in recent years and has a broad range of applications across different industries.
Process mining is a process analysis method that aims to discover, monitor and improve real processes by extracting knowledge from the event logs readily available in an organization's information systems.
Figure source: What’s Process Mining?

1.3 Tooling — bupaR

Figure: bupaR logo. Source: http://www.bupar.net/index.html
bupaR is an open-source, integrated suite of R-packages for the handling and analysis of business process data. It currently consists of 8 packages, including the central package, supporting different stages of a process mining workflow.
Figure source: Image of bupaR

2 Preparations

2.1 Load libraries

3 Understanding of the Data

Overview of the event log data

## Log of 262200 events consisting of:
## 13087 cases 
## 262200 instances of 24 activities 
## 69 resources 
## Events occurred from 2011-09-30 22:38:44 until 2012-03-14 15:04:54 
##  
## Variables were mapped as follows:
## Case identifier:     CASE_concept_name 
## Activity identifier:     activity_id 
## Resource identifier:     resource_id 
## Activity instance identifier:    activity_instance_id 
## Timestamp:           timestamp 
## Lifecycle transition:        lifecycle_id 
## 
## # A tibble: 262,200 x 9
##    CASE_concept_na… CASE_AMOUNT_REQ CASE_REG_DATE activity_id lifecycle_id
##    <chr>            <chr>           <chr>         <fct>       <fct>       
##  1 173688           20000           2011-10-01T0… A_SUBMITTED COMPLETE    
##  2 173688           20000           2011-10-01T0… A_PARTLYSU… COMPLETE    
##  3 173688           20000           2011-10-01T0… A_PREACCEP… COMPLETE    
##  4 173688           20000           2011-10-01T0… W_Complete… SCHEDULE    
##  5 173688           20000           2011-10-01T0… W_Complete… START       
##  6 173688           20000           2011-10-01T0… A_ACCEPTED  COMPLETE    
##  7 173688           20000           2011-10-01T0… O_SELECTED  COMPLETE    
##  8 173688           20000           2011-10-01T0… A_FINALIZED COMPLETE    
##  9 173688           20000           2011-10-01T0… O_CREATED   COMPLETE    
## 10 173688           20000           2011-10-01T0… O_SENT      COMPLETE    
## # … with 262,190 more rows, and 4 more variables: resource_id <fct>,
## #   timestamp <dttm>, activity_instance_id <chr>, .order <int>

Extract objects from Event Log

## There are  13087  cases;  24  unique activities;  69   unique resources;  4366   unique traces;
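
The counts above can be reproduced from any flat event table. The following is a minimal Python sketch of the idea, not the bupaR code behind this report; the toy `events` list and the `log_summary` helper are illustrative assumptions, with rows laid out as (case id, activity, resource).

```python
from collections import defaultdict

# Hypothetical toy log: (case_id, activity, resource), already in time order.
events = [
    ("c1", "A_SUBMITTED", "r1"),
    ("c1", "A_PARTLYSUBMITTED", "r1"),
    ("c1", "A_DECLINED", "r2"),
    ("c2", "A_SUBMITTED", "r1"),
    ("c2", "A_PARTLYSUBMITTED", "r1"),
    ("c2", "A_DECLINED", "r3"),
    ("c3", "A_SUBMITTED", "r1"),
    ("c3", "A_PARTLYSUBMITTED", "r1"),
    ("c3", "A_PREACCEPTED", "r1"),
]

def log_summary(events):
    """Count cases, distinct activities, distinct resources and distinct traces."""
    traces = defaultdict(list)
    for case_id, activity, _resource in events:
        traces[case_id].append(activity)
    return {
        "cases": len(traces),
        "activities": len({activity for _, activity, _ in events}),
        "resources": len({resource for _, _, resource in events}),
        # A trace is the ordered activity sequence of one case.
        "traces": len({tuple(t) for t in traces.values()}),
    }

summary = log_summary(events)
print(summary)
```

Cases c1 and c2 share the same activity sequence, so the toy log has three cases but only two unique traces.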

3.1 View at case level (a case represents the processing of one loan application)

| CASE_concept_name | trace_length | number_of_activities | start_timestamp | complete_timestamp | trace | trace_id | duration_in_days | first_activity | last_activity | final_status |
|---|---|---|---|---|---|---|---|---|---|---|
| 173697 | 3 | 3 | 2011-10-01 06:11:08 | 2011-10-01 06:11:46 | A_SUBMITTED,A_PARTLYSUBMITTED,A_DECLINED | 1 | 0.0004347 | A_SUBMITTED | A_DECLINED | DECLINED |
| 173700 | 3 | 3 | 2011-10-01 06:15:39 | 2011-10-01 06:16:21 | A_SUBMITTED,A_PARTLYSUBMITTED,A_DECLINED | 1 | 0.0004762 | A_SUBMITTED | A_DECLINED | DECLINED |
| 173703 | 9 | 5 | 2011-10-01 07:45:25 | 2011-10-01 11:02:12 | A_SUBMITTED,A_PARTLYSUBMITTED,A_PREACCEPTED,W_Completeren aanvraag,W_Completeren aanvraag,W_Completeren aanvraag,W_Completeren aanvraag,A_CANCELLED,W_Completeren aanvraag | 1897 | 0.1366522 | A_SUBMITTED | W_Completeren aanvraag | CANCELLED |
| 173727 | 3 | 3 | 2011-10-01 10:08:46 | 2011-10-01 10:09:30 | A_SUBMITTED,A_PARTLYSUBMITTED,A_DECLINED | 1 | 0.0005058 | A_SUBMITTED | A_DECLINED | DECLINED |
| 173733 | 6 | 4 | 2011-10-01 10:39:34 | 2011-10-01 12:54:56 | A_SUBMITTED,A_PARTLYSUBMITTED,W_Afhandelen leads,W_Afhandelen leads,A_DECLINED,W_Afhandelen leads | 2927 | 0.0940082 | A_SUBMITTED | W_Afhandelen leads | DECLINED |

## The application process always starts with A_SUBMITTED and can end with 11 different activities, including A_DECLINED/A_CANCELLED/A_REGISTERED/
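
To make the case-level columns concrete, here is a hedged Python sketch of how they could be derived from the events of a single case. The activity sequence mirrors case 173733 in the table, but the timestamps are invented, `summarise_case` is a hypothetical helper, and the rule used for `final_status` is an assumption because the report does not state how that column was computed.

```python
from datetime import datetime

# Hypothetical events for one case as (activity, timestamp). The activity
# sequence mirrors case 173733 above; the timestamps are invented.
case_events = [
    ("A_SUBMITTED", "2011-10-01 10:00:00"),
    ("A_PARTLYSUBMITTED", "2011-10-01 10:00:01"),
    ("W_Afhandelen leads", "2011-10-01 10:00:30"),
    ("W_Afhandelen leads", "2011-10-01 11:30:00"),
    ("A_DECLINED", "2011-10-01 12:59:00"),
    ("W_Afhandelen leads", "2011-10-01 13:00:00"),
]

def summarise_case(case_events):
    """Derive the case-level columns from one case's events."""
    fmt = "%Y-%m-%d %H:%M:%S"
    # Parse timestamps and sort chronologically (the sort is stable for ties).
    parsed = sorted(
        ((datetime.strptime(ts, fmt), act) for act, ts in case_events),
        key=lambda pair: pair[0],
    )
    activities = [act for _, act in parsed]
    start, end = parsed[0][0], parsed[-1][0]
    # Assumed rule for final_status; the source does not say how it is derived.
    if "A_DECLINED" in activities:
        status = "DECLINED"
    elif "A_CANCELLED" in activities:
        status = "CANCELLED"
    else:
        status = "OTHER"
    return {
        "trace_length": len(activities),
        "number_of_activities": len(set(activities)),
        "duration_in_days": (end - start).total_seconds() / 86400,
        "first_activity": activities[0],
        "last_activity": activities[-1],
        "final_status": status,
    }

case_summary = summarise_case(case_events)
print(case_summary)
```

Note that the last activity of a case need not be its final status: here the case ends with a work item even though the application was declined one step earlier.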

3.2 View at activity level

3.2.1 Deep understanding of event type

Application Events (A_)
Refers to states of the application itself.
* A_SUBMITTED / A_PARTLYSUBMITTED - Initial application submission
* A_PREACCEPTED - Application pre-accepted but requires additional information
* A_ACCEPTED - Application accepted and pending screen for completeness
* A_FINALIZED - Application finalized after passing screen for completeness
* A_APPROVED / A_REGISTERED / A_ACTIVATED - End state of successful (approved) applications
* A_CANCELLED / A_DECLINED - End states of unsuccessful applications

Offer Events (O_)
Refers to states of an offer communicated to the customer.
* O_SELECTED - Applicant selected to receive offer
* O_PREPARED / O_SENT - Offer prepared and transmitted to applicant
* O_SENT BACK - Offer response received from applicant
* O_ACCEPTED - End state of successful offer
* O_CANCELLED / O_DECLINED - End states of unsuccessful offers

Work item Events (W_)
Refers to states of work items that occur during the approval process. These events capture most of the manual effort exerted by the bank's resources and describe that effort at the various stages of the application process.
* W_Afhandelen leads - Following up on incomplete initial submissions
* W_Completeren aanvraag - Completing pre-accepted applications
* W_Nabellen offertes - Follow up after transmitting offers to qualified applicants
* W_Valideren aanvraag - Assessing the application
* W_Nabellen incomplete dossiers - Seeking additional information during assessment phase
* W_Beoordelen fraude - Investigating suspected fraud cases
* W_Wijzigen contractgegevens - Modifying approved contracts

Names and descriptions of transitions in the work item life cycle:
* SCHEDULE - Indicates a work item has been scheduled to occur in the future
* START - Indicates the opening / commencement of a work item
* COMPLETE - Indicates the closing / conclusion of a work item
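
Because the event type is encoded in the activity name prefix, it can be recovered with a simple lookup. The sketch below is illustrative only: `EVENT_TYPES` and `event_type` are hypothetical names, and the example trace is a shortened sequence made up of activities listed above.

```python
from collections import Counter

# Prefix-to-type mapping taken from the naming convention described above.
EVENT_TYPES = {"A_": "application", "O_": "offer", "W_": "work item"}

def event_type(activity):
    """Classify an activity name by its two-character prefix."""
    return EVENT_TYPES.get(activity[:2], "unknown")

trace = ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_PREACCEPTED",
         "W_Completeren aanvraag", "A_ACCEPTED", "O_SELECTED"]
type_counts = Counter(event_type(a) for a in trace)
print(type_counts)
```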

3.4 View at trace level

## There are  4366 unique traces; let's look at the most frequent ones (coverage = 50%).
## 26.2% of applications are declined directly; 14.3% of applications are declined because the applicant can't (or is not able to) complete the online application
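
The idea of looking at the most frequent traces up to a coverage level can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the trace explorer used in the report: `variants_for_coverage` is a hypothetical helper and the ten toy traces are invented.

```python
from collections import Counter

def variants_for_coverage(traces, coverage=0.5):
    """Return the most frequent trace variants that together reach `coverage`.

    Each result entry is (variant, number of cases, share of all cases).
    """
    counts = Counter(tuple(t) for t in traces)
    total = sum(counts.values())
    selected, covered = [], 0
    for variant, n in counts.most_common():
        if covered / total >= coverage:
            break  # the requested share of cases is already covered
        selected.append((variant, n, n / total))
        covered += n
    return selected

# Invented toy log: 4 + 3 + 2 + 1 cases spread over four variants.
traces = (
    [["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_DECLINED"]] * 4
    + [["A_SUBMITTED", "A_PARTLYSUBMITTED", "W_Afhandelen leads", "A_DECLINED"]] * 3
    + [["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_PREACCEPTED"]] * 2
    + [["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_PREACCEPTED", "A_CANCELLED"]]
)
top_variants = variants_for_coverage(traces, coverage=0.5)
for variant, n, share in top_variants:
    print(n, f"{share:.0%}", ",".join(variant))
```

In the toy log the most frequent variant covers only 40% of cases, so a second variant is needed to reach the 50% level.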

4 Understanding the Process in Detail

4.1 The whole process map

The full process map looks dizzying.

4.2 Process at application level

Let’s only look at the process at application level:

From the diagram above we can see:
* A_PARTLYSUBMITTED is redundant with A_SUBMITTED
* A_ACTIVATED, A_APPROVED and A_REGISTERED are redundant: each occurs 2246 times, but they follow A_FINALIZED in no consistent order
* Total cases (13087) = 2807 (A_CANCELLED) + 2246 (A_Succeed) + 7635 (A_DECLINED) + 69 (A_PREACCEPTED) + 3 (A_ACCEPTED) + 327 (A_FINALIZED)
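
The breakdown in the last bullet assigns each case to its last application-level activity, with the three success activities merged into one state. A hedged Python sketch of that grouping follows; `last_application_state` is a hypothetical helper, the label "A_Succeed" is taken from the bullet above, and the sketch assumes every trace contains at least one A_ activity.

```python
SUCCESS = {"A_APPROVED", "A_REGISTERED", "A_ACTIVATED"}

def last_application_state(trace):
    """Last A_ activity of a trace, with the three success states merged."""
    application_events = [a for a in trace if a.startswith("A_")]
    last = application_events[-1]
    return "A_Succeed" if last in SUCCESS else last

# Trace shape of case 173703 from section 3.1: cancelled, then a trailing work item.
example = ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_PREACCEPTED",
           "W_Completeren aanvraag", "W_Completeren aanvraag",
           "W_Completeren aanvraag", "W_Completeren aanvraag",
           "A_CANCELLED", "W_Completeren aanvraag"]
print(last_application_state(example))

# Counts reported above; they should add up to the total number of cases.
reported = {"A_CANCELLED": 2807, "A_Succeed": 2246, "A_DECLINED": 7635,
            "A_PREACCEPTED": 69, "A_ACCEPTED": 3, "A_FINALIZED": 327}
total_cases = sum(reported.values())
print(total_cases)
```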

We can walk through the process step by step (by application-level activity) to get more detailed information. For example, the following diagram shows what happens between A_PARTLYSUBMITTED and A_PREACCEPTED.

Based on the observations above, we can simplify the process map by collapsing the redundant activities:

The following diagram shows the direct relationship between each pair of activities:
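
A direct relationship between two activities is usually counted as "a is immediately followed by b within the same case". The sketch below shows that counting in plain Python; it is an illustration with invented toy traces, and `directly_follows` is a hypothetical helper rather than the function used to draw the diagram.

```python
from collections import Counter

def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces:
        # zip pairs each activity with its successor in the same trace.
        counts.update(zip(trace, trace[1:]))
    return counts

# Invented application-level traces.
traces = [
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_DECLINED"],
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_PREACCEPTED", "A_CANCELLED"],
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_PREACCEPTED", "A_DECLINED"],
]
follows = directly_follows(traces)
for (a, b), n in follows.most_common():
    print(f"{a} -> {b}: {n}")
```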

The following diagram summarizes the standard process flow at the application level and the number of cases at each phase:
Figure source: What’s Process Mining?

4.3 Declined case

Next, I look more closely at the declined cases to find out why they were declined. From the chart above, 5719 of the 7635 declined cases are declined immediately after online submission.
Let’s check the other declined cases:

From the chart above, we can see:
* 3429 were declined right after application submission
* 2234 were declined after W_Afhandelen leads (Following up on incomplete initial submissions)
* 25 were declined after A_ACCEPTED
* 57 were declined after W_Beoordelen fraude (Investigating suspected fraud cases)
* 1088 were declined after W_Completeren aanvraag (Completing pre-accepted applications)
* 86 were declined after W_Nabellen incomplete dossiers (Seeking additional information during assessment phase)
* 668 were declined after W_Valideren aanvraag (Assessing the application)
* 48 were declined after W_Nabellen offertes (Follow up after transmitting offers to qualified applicants)
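
A breakdown like the one above can be obtained by counting, per declined case, the activity that immediately precedes A_DECLINED. The following Python sketch illustrates this under assumptions: `activity_before` is a hypothetical helper, and the toy traces reuse two trace shapes from the case-level table in section 3.1.

```python
from collections import Counter

def activity_before(traces, target="A_DECLINED"):
    """Count which activity immediately precedes the first `target` in each trace."""
    counts = Counter()
    for trace in traces:
        if target in trace:
            position = trace.index(target)
            if position > 0:
                counts[trace[position - 1]] += 1
    return counts

# Two trace shapes taken from the case-level table in section 3.1.
traces = [
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_DECLINED"],
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_DECLINED"],
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "W_Afhandelen leads",
     "W_Afhandelen leads", "A_DECLINED", "W_Afhandelen leads"],
]
predecessors = activity_before(traces)
print(predecessors)

# Counts reported in the list above; they add up to all declined cases.
reported = [3429, 2234, 25, 57, 1088, 86, 668, 48]
reported_total = sum(reported)
print(reported_total)
```

In this sketch a case declined right after submission shows up with A_PARTLYSUBMITTED as the preceding activity, since that event always follows A_SUBMITTED in the traces shown.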

Illustrated as:
Figure source: What’s Process Mining?

Overall, the following chart shows the final status of all applications: only 2246 of 13087 succeeded.

5 Conclusions

Through comprehensive analysis of the event log, we managed to convert a data set containing 262,200 events and 13,087 cases into a clearly interpretable, end-to-end workflow for a loan and overdraft approval process. I suggest the following improvements:
* 1. Simplify the process by removing redundant / mixed activities, e.g. A_SUBMITTED / A_PARTLYSUBMITTED; A_APPROVED / A_REGISTERED / A_ACTIVATED.
* 2. There are 4366 distinct traces in total. According to the trace explorer, 12 trace types cover 50% of the cases, and the longest of these contains only 14 activities, so there is room to optimize the process.
* 3. Refine the automated assessment that follows online submission: 58% of all cases are declined, and 44% are declined directly.
* 4. The duration_in_days of declined applications is 2.0 days, whereas for successful and cancelled applications it is 16.7 and 18.5 days respectively.
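
The percentages in point 3 follow directly from the case counts reported in sections 4.2 and 4.3. A quick arithmetic check in Python, using only figures stated in this report (the variable names are illustrative):

```python
total_cases = 13087
declined = 7635
declined_directly = 5719  # declined right after online submission (section 4.3)

declined_share = round(100 * declined / total_cases)
direct_share = round(100 * declined_directly / total_cases)
print(f"declined: {declined_share}%, declined directly: {direct_share}%")
```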

6 Heuristics Miner

Heuristics Miner is an algorithm that operates on the directly-follows graph. It provides a way to handle noise and to find common constructs (for example, a dependency between two activities, or an AND relation). The output of the Heuristics Miner is a Heuristics Net, an object that contains the activities and the relationships between them.

6.1 Mining of the dependency graph

A frequency-based metric indicates how certain we are that a true dependency relation exists between two activities A and B (notation A ⇒W B). Let W be an event log over T, and a, b ∈ T. Then |a >W b| is the number of times a is directly followed by b in W, and:

$$a \Rightarrow_W b = \frac{|a >_W b| - |b >_W a|}{|a >_W b| + |b >_W a| + 1}$$

Source: Calculation of dependency.
An example helps to explain the formula.
If we use this definition in the situation that, in 5 traces, activity A is directly followed by activity B but the other way around never occurs, the value of A ⇒W B = 5/6 = 0.833 indicating that we are not completely sure of the dependency relation (only 5 observations possibly caused by noise).
However if there are 50 traces in which A is directly followed by B but the other way around never occurs, the value of A ⇒W B = 50/51 = 0.980 indicates that we are pretty sure of the dependency relation.
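
The two worked values can be checked with a few lines of Python. This is a minimal sketch of the dependency measure as used in the examples above (difference of the two directly-follows counts divided by their sum plus one), not the implementation of any particular package; `dependency`, `dependency_edges` and the 0.8 threshold are illustrative assumptions.

```python
from collections import Counter

def dependency(a_then_b, b_then_a):
    """Dependency value for A => B from the two directly-follows counts."""
    return (a_then_b - b_then_a) / (a_then_b + b_then_a + 1)

def dependency_edges(traces, threshold=0.8):
    """Keep pairs (a, b), a != b, whose dependency value reaches the threshold."""
    follows = Counter()
    for trace in traces:
        follows.update(zip(trace, trace[1:]))
    edges = {}
    for (a, b), n in follows.items():
        if a == b:
            continue  # self-loops need separate treatment, omitted in this sketch
        value = dependency(n, follows[(b, a)])
        if value >= threshold:
            edges[(a, b)] = value
    return edges

print(round(dependency(5, 0), 3))   # 5 observations
print(round(dependency(50, 0), 3))  # 50 observations

# Five identical invented traces: each pair is observed 5 times, never reversed.
edges = dependency_edges([["A", "B", "C"]] * 5)
print(edges)
```

The "+ 1" in the denominator is what makes the value grow with the number of observations: a relation seen only a handful of times scores lower than the same relation seen often, which is how the measure discounts possible noise.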

Dependency graph of application process

Causal graph / Heuristics net